Runbook: Istio mTLS Failure

Alert

Severity

Critical -- mTLS failures break service-to-service communication within the mesh. In STRICT mode (the SRE platform default), any service without a valid Istio sidecar proxy will be unable to communicate with mesh services.

Impact

Investigation Steps

  1. Check Istiod (control plane) status:
kubectl get pods -n istio-system -l app=istiod
kubectl logs -n istio-system deployment/istiod --tail=100
  2. List the PeerAuthentication policies in all namespaces:
kubectl get peerauthentication -A
  3. Verify mTLS mode for a specific namespace:
kubectl get peerauthentication -n <namespace> -o yaml
  4. Check proxy sync status for the whole mesh, then for the failing pod:
istioctl proxy-status
istioctl proxy-status <pod-name>.<namespace>
  5. If istioctl is not available, check the sidecar proxy logs:
kubectl logs <pod-name> -n <namespace> -c istio-proxy --tail=100
  6. Check for TLS handshake errors:
kubectl logs <pod-name> -n <namespace> -c istio-proxy --tail=200 | grep -i "tls\|handshake\|ssl\|certificate"
  7. Check if the destination namespace has sidecar injection enabled:
kubectl get namespace <namespace> --show-labels | grep istio-injection
  8. Verify the sidecar is present on both source and destination pods:
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].name}'
  9. Check Istio DestinationRules that might override mTLS settings:
kubectl get destinationrules -A
kubectl get destinationrules -A -o yaml | grep -B 5 -A 5 "tls"
  10. Check the Istio HelmRelease status:
flux get helmrelease istio-base -n istio-system
flux get helmrelease istiod -n istio-system
  11. Verify that the certificate presented by the destination is within its validity dates (this assumes openssl is available in the istio-proxy container):
kubectl exec <pod-name> -n <namespace> -c istio-proxy -- openssl s_client -connect <destination-svc>.<destination-ns>.svc.cluster.local:<port> -tls1_2 2>/dev/null | openssl x509 -noout -dates
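The log checks above can be wrapped in a small helper so the source and destination pods are easy to compare. This is a minimal sketch, not part of the platform tooling: the function names count_tls_errors and scan_pod_tls_errors are hypothetical, and the keyword list is simply the one used in the grep step above.

```shell
# Sketch only: both function names are hypothetical helpers.

# Counts stdin lines that mention TLS-related keywords (case-insensitive).
# grep -c prints 0 but exits non-zero when nothing matches, so the
# exit status is deliberately ignored.
count_tls_errors() {
  grep -ciE 'tls|handshake|ssl|certificate' || true
}

# Prints "<pod> <count>" for the last 200 sidecar log lines of one pod.
# A count of 0 can also mean the logs could not be read; kubectl's own
# error message is left visible on stderr for that case.
# Usage: scan_pod_tls_errors <pod-name> <namespace>
scan_pod_tls_errors() {
  local pod="$1" ns="$2" count
  count="$(kubectl logs "$pod" -n "$ns" -c istio-proxy --tail=200 | count_tls_errors)"
  echo "$pod $count"
}
```

Running scan_pod_tls_errors once for the source pod and once for the destination pod gives a quick indication of which side is logging the failures.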

Resolution

Pod missing Istio sidecar

  1. Verify the namespace has the injection label:
kubectl get namespace <namespace> -o jsonpath='{.metadata.labels.istio-injection}'
  2. If missing, add the label:
kubectl label namespace <namespace> istio-injection=enabled --overwrite
  3. Restart the pods to inject the sidecar:
kubectl rollout restart deployment <name> -n <namespace>
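After the restart it can help to confirm that no pod in the namespace is still running without a proxy. The following is a sketch under assumptions: pods_missing_sidecar is a hypothetical helper name, and init containers are included in the check because some Istio setups run the proxy there as a native sidecar.

```shell
# Sketch only: pods_missing_sidecar is a hypothetical helper.
# Prints the name of every pod in a namespace that has no container
# (regular or init) named exactly "istio-proxy".
# Usage: pods_missing_sidecar <namespace>
pods_missing_sidecar() {
  local ns="$1"
  kubectl get pods -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].name}{" "}{.spec.initContainers[*].name}{"\n"}{end}' |
    awk -F'\t' '
      NF == 0 { next }                     # skip blank lines
      {
        found = 0
        n = split($2, names, " ")          # container names are space-separated
        for (i = 1; i <= n; i++) if (names[i] == "istio-proxy") found = 1
        if (!found) print $1
      }'
}
```

An empty result means every pod has the sidecar. Pods that are not owned by a Deployment (for example Jobs or bare pods) are not affected by kubectl rollout restart and may still appear in the list.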

mTLS failing between namespaces

  1. Check if both namespaces have STRICT PeerAuthentication:
kubectl get peerauthentication -n <source-namespace>
kubectl get peerauthentication -n <destination-namespace>
  2. Ensure no DestinationRule is disabling mTLS:
kubectl get destinationrules -n <namespace> -o yaml | grep -A 10 "trafficPolicy"
  3. If a service needs to accept plaintext (e.g., external health check), create a permissive PeerAuthentication for that specific port:
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: allow-plaintext-health
  namespace: <namespace>
spec:
  selector:
    matchLabels:
      app: <app-name>
  portLevelMtls:
    8080:
      mode: PERMISSIVE
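The DestinationRule check in the steps above can be narrowed from a grep over YAML to a list of the rules that actually disable TLS. This is a minimal sketch: destinationrules_disabling_tls is a hypothetical helper name, and it only inspects the top-level trafficPolicy, so TLS settings under portLevelSettings or subsets still need a manual look.

```shell
# Sketch only: destinationrules_disabling_tls is a hypothetical helper.
# Prints "<rule-name> -> <host>" for every DestinationRule in a namespace
# whose top-level trafficPolicy.tls.mode is DISABLE.
# Usage: destinationrules_disabling_tls <namespace>
destinationrules_disabling_tls() {
  local ns="$1"
  kubectl get destinationrules -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.host}{"\t"}{.spec.trafficPolicy.tls.mode}{"\n"}{end}' |
    awk -F'\t' '$3 == "DISABLE" { print $1 " -> " $2 }'
}
```

Rules without a tls block produce an empty third column and are skipped.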

Platform namespace communication (Istio injection disabled)

Platform namespaces (kube-system, monitoring, logging, kyverno, etc.) do not have Istio injection enabled. If a platform service needs to reach a mesh service:

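  1. Create a DestinationRule to disable mTLS for that specific service:
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: <service>-plaintext
  namespace: <mesh-namespace>
spec:
  host: <service>.<mesh-namespace>.svc.cluster.local
  trafficPolicy:
    tls:
      mode: DISABLE
  2. Or set the PeerAuthentication to PERMISSIVE for that service. This is the option that applies when the platform client has no sidecar: a DestinationRule only configures clients that run a sidecar, so it does not change what the destination accepts.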
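The PERMISSIVE option has no manifest in this runbook, so here is a sketch of what one could look like, generated from a shell helper so it can be reviewed before it is applied. The helper name render_permissive_peerauth, the resource name suffix -permissive, and the use of the app label as the workload selector are all assumptions made for illustration.

```shell
# Sketch only: render_permissive_peerauth is a hypothetical helper.
# Prints a workload-scoped PeerAuthentication that accepts both mTLS and
# plaintext. Assumes the workload is labelled app=<app-name>.
# Usage: render_permissive_peerauth <app-name> <mesh-namespace>
render_permissive_peerauth() {
  local app="$1" ns="$2"
  cat <<EOF
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: ${app}-permissive
  namespace: ${ns}
spec:
  selector:
    matchLabels:
      app: ${app}
  mtls:
    mode: PERMISSIVE
EOF
}
```

Review the output first, then pipe it to kubectl apply -f - once it looks right. Scoping the policy with a selector keeps the rest of the namespace in STRICT mode.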

Istiod certificate rotation failure

  1. Check Istiod logs for certificate errors:
kubectl logs -n istio-system deployment/istiod --tail=200 | grep -i "cert\|ca\|root"
  2. Check if the Istio root CA secret exists:
kubectl get secret istio-ca-secret -n istio-system
  3. If certificates are expired, restart Istiod to trigger re-issuance:
kubectl rollout restart deployment istiod -n istio-system
  4. Then restart all application pods to get new certificates:
for ns in $(kubectl get namespaces -l istio-injection=enabled -o name); do
  kubectl rollout restart deployment -n "${ns##*/}" 2>/dev/null
done
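Restarting every Deployment in the mesh is disruptive, so it may be worth previewing which namespaces the loop will touch before running it. The following is a sketch of the same loop with a dry-run switch; restart_mesh_deployments and its argument are hypothetical, and like the loop above it only covers namespaces labelled istio-injection=enabled and only Deployments.

```shell
# Sketch only: restart_mesh_deployments is a hypothetical helper.
# With no argument (or "true") it only prints what it would do.
# Pass "false" to perform the rollout restarts.
# Usage: restart_mesh_deployments [true|false]
restart_mesh_deployments() {
  local dry_run="${1:-true}" ns
  for ns in $(kubectl get namespaces -l istio-injection=enabled -o name); do
    ns="${ns##*/}"   # strip the "namespace/" prefix from -o name output
    if [ "$dry_run" = "true" ]; then
      echo "would restart deployments in $ns"
    else
      kubectl rollout restart deployment -n "$ns"
    fi
  done
}
```

Call it once without arguments to review the list, then again with false to perform the restarts.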

Istio sidecar injection webhook failure

  1. Check the webhook configuration:
kubectl get mutatingwebhookconfigurations istio-sidecar-injector -o yaml
  2. Verify the webhook service is reachable:
kubectl get svc -n istio-system istiod
kubectl get endpoints -n istio-system istiod
  3. If the webhook is down, restart Istiod:
kubectl rollout restart deployment istiod -n istio-system
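The endpoint check above can be turned into a pass/fail test, which is handy for confirming recovery after the restart. This is a minimal sketch with hypothetical helper names (istiod_endpoint_count, check_istiod_webhook_backend); it only counts ready addresses behind the istiod Service and does not call the webhook itself.

```shell
# Sketch only: both function names are hypothetical helpers.

# Prints the number of ready endpoint addresses behind the istiod Service.
istiod_endpoint_count() {
  kubectl get endpoints istiod -n istio-system \
    -o jsonpath='{.subsets[*].addresses[*].ip}' | wc -w | tr -d '[:space:]'
}

# Succeeds when at least one ready endpoint exists, fails otherwise.
check_istiod_webhook_backend() {
  local count
  count="$(istiod_endpoint_count)"
  if [ "$count" -gt 0 ]; then
    echo "istiod has $count ready endpoint(s)"
  else
    echo "istiod has no ready endpoints; the injection webhook cannot be served"
    return 1
  fi
}
```

A non-zero exit after the restart suggests Istiod itself is not becoming ready, in which case the Istiod pod status and logs from the investigation steps are the next place to look.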

Prevention

Escalation